28 Feb 2021 - Constrained Optimization using Lagrange Multipliers

For the inaugural post on this new project, let's take a crack at learning constrained optimization using Lagrange multipliers.

The goal here isn't necessarily to teach others how to use this technique since there are plenty of other resources available. But rather, to demonstrate the use of the technique in a clean and communicative manner simply as an exercise for myself.

Today, we're going to use Lagrange multipliers to analytically find the extrema of a two-dimensional function, and then confirm our findings numerically.

Technique focus:

First, let's introduce the function we'll be investigating today. This example comes straight from the example section of the Wikipedia page on Lagrange multipliers.

$$ \begin{equation} \label{eq:f} f(x,y) = x^2 y \end{equation} $$

Next, we define a constraint on $f$,

$$ \begin{equation} \label{eq:g} g(x,y) = x^2 + y^2 - 3 = 0 \end{equation} $$

In English, our goal is to find the high and low points of $f$ along a circle that is centered on the origin and with a radius of $\sqrt{3}$.

As we see in eq. \eqref{eq:f}

First, let's form the Lagrangian,

$$ \begin{equation} L(x, y, \lambda) = f(x,y) + \lambda g(x,y) \end{equation} $$

And just to make things clear, let's expand it out, substituting in the equations for $f$ and $g$. We'll also distribute the $\lambda$.

$$ \begin{equation} L(x, y, \lambda) = x^2 y + \lambda x^2 + \lambda y^2 - 3\lambda \end{equation} $$

To find the extrema, we need to find values for our three unknowns $\left( x, y, \lambda \right)$. These three equations come from setting equal to zero the three independent partial derivatives of $L$ with respect to each of these variables. $$ \begin{align} \frac{\partial{L}}{\partial{x}} &= 2xy + 2 \lambda x = 0 \label{eq:L_by_x} \\ \frac{\partial{L}}{\partial{y}} &= x^2 + 2 \lambda y = 0 \label{eq:L_by_y} \\ \frac{\partial{L}}{\partial{\lambda}} &= x^2 + y^2 - 3 = 0 \label{eq:L_by_lambda} \end{align} $$

Starting with \eqref{eq:L_by_x},

$$ \begin{align} 2xy + 2 \lambda y &= 0 \nonumber \\ 2x \left( y + \lambda \right) &= 0 \nonumber \\ x \left( y + \lambda \right) &= 0 \end{align} $$

This tells us that either $x = 0$ or $y = -\lambda$.

First we'll plug $x = 0$ into $\eqref{eq:L_by_lambda}$, $$ \begin{align} x^2 + y^2 - 3 &= 0 \nonumber \\ y^2 - 3 &= 0 \nonumber \\ y^2 &= 3 \nonumber \\ y &= \pm \sqrt{3} \end{align} $$

That gets us $x$ and $y$, but since we're also interested in $\lambda$ we'll next plug $x=0$ and $y = \pm \sqrt{3}$ into \eqref{eq:L_by_y}, $$ \begin{align} x^2 + 2\lambda y &= 0 \nonumber \\ 2\left(\pm\sqrt{3}\right) \lambda &= 0 \nonumber \\ \lambda &= 0 \end{align} $$

This gets us our first set of critical points: $$ \boxed{ \left( 0, \sqrt{3}, 0 \right) \\ \left( 0, -\sqrt{3}, 0 \right) } $$

Next, if $y = -\lambda$ then $\lambda = -y$. Plugging that into $\eqref{eq:L_by_y}$,

$$ \begin{align} x^2 + 2\lambda y &= 0 \nonumber \\ x^2 - 2 y^2 &= 0 \nonumber \\ x^2 &= 2 y^2 \end{align} $$

And then, plugging $x^2 = 2 y^2$ into $\eqref{eq:L_by_lambda}$, $$ \begin{align} x^2 + y^2 - 3 &= 0 \nonumber \\ 2y^2 + y^2 - 3 &= 0 \nonumber \\ 3y^2 - 3 &= 0 \nonumber \\ 3y^2 &= 3 \nonumber \\ y^2 &= 1 \nonumber \\ y &= \pm \sqrt{1} = \pm 1 \end{align} $$

And since $\lambda = -y$, $\lambda = \pm 1$, though we must take care to always use the opposite sign as $y$. The result is that $\lambda y = -1$.

Plugging $y = 1$ and $\lambda = -1$, or $y = -1$ and $\lambda = 1$ into $\eqref{eq:L_by_y}$, $$ \begin{align} x^2 + 2 \lambda y &= 0 \nonumber \\ x^2 &= -2 \lambda y \nonumber \\ x^2 &= 2 \nonumber \\ x &= \pm \sqrt{2} \end{align} $$

Now we can take $x = \pm \sqrt{2}$, $y = \pm 1$, and $\lambda = \pm 1$ we get our remaining critical points. Including the first two we found, all our critical points are:

$$ \boxed{ \left( 0, \sqrt{3}, 0 \right) \\ \left( 0, -\sqrt{3}, 0 \right) \\ \left( \sqrt{2}, -1, 1 \right) \\ \left( -\sqrt{2}, -1, 1 \right) \\ \left( -\sqrt{2}, 1, -1 \right) \\ \left( \sqrt{2}, 1, -1 \right) } $$

Next, we evaluate $\eqref{eq:f}$ with the $x$ and $y$ values from the critical points.

$$ \begin{align} f \left(0, \sqrt{3} \right) &= 0 \\ f \left(0, -\sqrt{3} \right) &= 0 \\ f \left(\sqrt{2}, -1 \right) &= -2 \\ f \left(-\sqrt{2}, 1 \right) &= 2 \\ f \left(-\sqrt{2}, -1 \right) &= -2 \\ f \left(\sqrt{2}, 1 \right) &= 2 \end{align} $$

So we find maxima at $\left( \sqrt{2}, 1 \right)$ and $\left( -\sqrt{2}, 1 \right)$ and minima at $\left( \sqrt{2}, -1 \right)$ and $\left( -\sqrt{2}, -1 \right)$.

Now let's confirm that numerically.

First, let's define the function we're exploring.

Next, calculate some points along the circle that represents our constraint $g$. We can evaluate $f$ for each point along this circle and inspect the extrema.

f_on_g contains the value of $f$ when evaluated at each point we've calculated along our constraint circle $g$, starting at theta = 0 and moving around the circle counterclockwise.

Next, let's visualize $f$ with a 3D mesh just so we can take a look at it and get some feel for what we're working with.

Just eyeballing that plot, the results we found analytically seem plausible, but it's difficult to be sure. Let's try another method.

We can make a simple plot of the value of $f$ along $g$ by unwinding the circle and using theta (converted to degrees) as the x-axis.

As we'd expect from our earlier analysis, we see two global maxima and two global minima. But we also see a local minimum and local maximum we'll have to ingestigate.

To improve the visualization, let's add in the critical points that we found earlier, along with their $\lambda$ values. This will allow us to confirm that they line up with the observed extrema and might give us a clue as to the non-global maximum/minimum around 90° and 270°.

Comparing the above plot to our predictions that expect maxima at $\left( \sqrt{2}, 1 \right)$ and $\left( -\sqrt{2}, 1 \right)$ and minima at $\left( \sqrt{2}, -1 \right)$ and $\left( -\sqrt{2}, -1 \right)$, we see that the values we arrived at analytically agree with the values we arrived at numerically.

Additionally, the plot suggesets that the sign of $\lambda$ indicates if a point is a local maxima or minima, with negative values of $\lambda$ corresponding to local maxima, positive values of $\lambda$ corresponding to local minima, and $\lambda = 0$ corresponding to saddle points. But that will be a topic for another post.